interpretable clustering
Interpretable Clustering: A Survey
Hu, Lianyu, Jiang, Mudi, Dong, Junjie, Liu, Xinying, He, Zengyou
In recent years, much of the research on clustering algorithms has primarily focused on enhancing their accuracy and efficiency, frequently at the expense of interpretability. However, as these methods are increasingly being applied in high-stakes domains such as healthcare, finance, and autonomous systems, the need for transparent and interpretable clustering outcomes has become a critical concern. This is not only necessary for gaining user trust but also for satisfying the growing ethical and regulatory demands in these fields. Ensuring that decisions derived from clustering algorithms can be clearly understood and justified is now a fundamental requirement. To address this need, this paper provides a comprehensive and structured review of the current state of explainable clustering algorithms, identifying key criteria to distinguish between various methods. These insights can effectively assist researchers in making informed decisions about the most suitable explainable clustering methods for specific application contexts, while also promoting the development and adoption of clustering algorithms that are both efficient and transparent.
- Asia > China > Liaoning Province > Dalian (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
Interpretable Clustering with the Distinguishability Criterion
Cluster analysis is a popular unsupervised learning tool used in many disciplines to identify heterogeneous sub-populations within a sample. However, validating cluster analysis results and determining the number of clusters in a data set remains an outstanding problem. In this work, we present a global criterion called the Distinguishability criterion to quantify the separability of identified clusters and validate inferred cluster configurations. Our computational implementation of the Distinguishability criterion corresponds to the Bayes risk of a randomized classifier under the 0-1 loss. We propose a combined loss function-based computational framework that integrates the Distinguishability criterion with many commonly used clustering procedures, such as hierarchical clustering, k-means, and finite mixture models. We present these new algorithms as well as the results from comprehensive data analysis based on simulation studies and real data applications.
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- Europe > Middle East (0.04)
- Asia > Middle East (0.04)
- (5 more...)
Optimal Decision Trees For Interpretable Clustering with Constraints (Extended Version)
Shati, Pouya, Cohen, Eldan, McIlraith, Sheila
Constrained clustering is a semi-supervised task that employs a limited amount of labelled data, formulated as constraints, to incorporate domain-specific knowledge and to significantly improve clustering accuracy. Previous work has considered exact optimization formulations that can guarantee optimal clustering while satisfying all constraints, however these approaches lack interpretability. Recently, decision-trees have been used to produce inherently interpretable clustering solutions, however existing approaches do not support clustering constraints and do not provide strong theoretical guarantees on solution quality. In this work, we present a novel SAT-based framework for interpretable clustering that supports clustering constraints and that also provides strong theoretical guarantees on solution quality. We also present new insight into the trade-off between interpretability and satisfaction of such user-provided constraints. Our framework is the first approach for interpretable and constrained clustering. Experiments with a range of real-world and synthetic datasets demonstrate that our approach can produce high-quality and interpretable constrained clustering solutions.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.63)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.63)
Interpretable Clustering via Optimal Trees
Bertsimas, Dimitris, Orfanoudaki, Agni, Wiberg, Holly
State-of-the-art clustering algorithms use heuristics to partition the feature space and provide little insight into the rationale for cluster membership, limiting their interpretability. In healthcare applications, the latter poses a barrier to the adoption of these methods since medical researchers are required to provide detailed explanations of their decisions in order to gain patient trust and limit liability. We present a new unsupervised learning algorithm that leverages Mixed Integer Optimization techniques to generate interpretable tree-based clustering models. Utilizing the flexible framework of Optimal Trees [1], our method approximates the globally optimal solution leading to high quality partitions of the feature space. Our algorithm, can incorporate various internal validation metrics, naturally determines the optimal number of clusters, and is able to account for mixed numeric and categorical data. It achieves comparable or superior performance on both synthetic and real world datasets when compared to K-Means while offering significantly higher interpretability.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
- Oceania > Australia > Queensland > Townsville (0.04)
- Europe > Middle East > Malta > Port Region > Southern Harbour District > Floriana (0.04)